BACKGROUND:

The  inflammatory  bowel  diseases  (IBD)  Crohn's  disease  and  ulcerative  colitis  result  from  alterations  in  intestinal 
microbes  and  the  immune  system.  However,  the  precise  dysfunctions  of  microbial  metabolism  in  the  gastrointestinal 
microbiome  during  IBD  remain  unclear.  We  analyzed  the  microbiota  of  intestinal  biopsies  and  stool  samples  from 
231  IBD  and  healthy  subjects  by  16S  gene  pyrosequencing  and  followed  up  a  subset  using  shotgun  metagenomics. 
Gene  and  pathway  composition  were  assessed,  based  on  16S  data  from  phylogenetically-related  reference  genomes, 
and  associated  using  sparse  multivariate  linear  modeling  with  medications,  environmental  factors,  and  IBD  status. 
RESULTS: 

Firmicutes  and  Enterobacteriaceae  abundances  were  associated  with  disease  status  as  expected,  but  also  with 
treatment  and  subject  characteristics.  Microbial  function,  though,  was  more  consistently  perturbed  than  composition, 
with  12%  of  analyzed  pathways  changed  compared  with  2%  of  genera.  We  identified  major  shifts  in  oxidative  stress 
pathways,  as  well  as  decreased  carbohydrate  metabolism  and  amino  acid  biosynthesis  in  favor  of  nutrient  transport 
and  uptake.  The  microbiome  of  ileal  Crohn's  disease  was  notable  for  increases  in  virulence  and  secretion  pathways. 
CONCLUSIONS: 

This  inferred  functional  metagenomic  information  provides  the  first  insights  into  community-wide  microbial 
processes  and  pathways  that  underpin  IBD  pathogenesis. 

Background 

Inflammatory  bowel  disease  (IBD),  a  chronic  and  relapsing 
inflammatory  condition  of  the  gastrointestinal  (GI)  tract, 
is  intimately  linked  to  the  microbial  communities  of  the 
human  gut.  Although  it  is  now  widely  accepted  that  IBD 
results  from  altered  interactions  between  gut  microbes 
and  the  intestinal  immune  system,  the  precise  nature  of 
the  intestinal  microbiota  dysfunction  in  IBD  remains  to  be 
elucidated  [1].  IBD  further  includes  two  main  subtypes, 
ulcerative  colitis  (UC)  and  Crohn’s  disease  (CD),  which 
each  include  distinct  microbial  perturbations  and  tissue 
localizations.  The  former  is  confined  to  the  colon,  while 
the  latter  may  affect  any  part  of  the  digestive  tract,  with 
unclear  implications  for  microbial  involvement  or  causality 
[2]. In particular, the microbial mechanisms and metabolism underlying the role of the GI
microbiome in IBD onset, and its alteration in the course of active treatment and recovery,
are still unknown.

In the last decade, advances in DNA sequencing have allowed exploration of the 40% of the gut
microbiome that is still uncultured [3], setting the stage for investigation of the IBD
microbiome. The GI microbiome of healthy humans is dominated by four major bacterial phyla:
Firmicutes, Bacteroidetes, and to a lesser degree Proteobacteria and Actinobacteria [4,5].
Many studies have observed imbalances or dysbioses in the GI microbiomes of IBD patients
[6-13]; in both CD and UC patients, there is decreased biodiversity, a lower proportion of
Firmicutes, and an increase in Gammaproteobacteria [14]. In CD, proportions of the Clostridia
are altered: the Roseburia and Faecalibacterium genera of the Lachnospiraceae and
Ruminococcaceae families are decreased, whereas Ruminococcus gnavus increases [15-17].
Specific features of UC-associated dysbiosis are less well described, although increased
sulfate-reducing Deltaproteobacteria have been reported [18,19]. These studies have described
typical changes in composition of the IBD gut community, but the functional roles of these
organisms, or of the entirety of a dysbiotic community, remain less clear.

The normal gut microbiome exhibits tremendous functional diversity encoded by a collection of
bacterial genes numbering more than 100 times the human gene set [4,20]. Thus, the genomic
potential of the human microbiome is far greater than that of its host, and treatments, diets,
or medications that affect the host will also likely affect the microbiome. A primary example
of the importance of the microbiome to host health is in the digestion of dietary fiber, which
is used by the microbiota of the lower GI tract as their main source of energy [21].
Fibrolytic bacteria degrade polysaccharides into smaller carbohydrates, which are then
fermented into short-chain fatty acids (SCFAs) such as acetate, propionate, and butyrate.
Butyrate in particular is a major source of energy for colonocytes, but all three of these
have demonstrated immunomodulatory properties [22-27]. In addition to these metabolic
functions, many genetic studies in IBD have highlighted the central role of host-microbe
interactions in IBD pathogenesis [1,28-30]. Specific host pathways linked to microbial
response in IBD include T-cell activation, the IL-23/T helper 17 pathway, autophagy [31], and
Paneth cell function [32]. Together, these results support the centrality of host-microbiota
crosstalk for gut homeostasis and in turn the role of dysfunctional crosstalk between the
host and GI microbiome in IBD.

Little work has yet bridged the gap between IBD pathogenesis in a human host, individual
microbes, and alterations in metabolism of the GI microbial community in IBD. Few studies of
the IBD gut microbiome have investigated microbiome function [33], and these have not
systematically accounted for the influences of treatments and environmental factors. We have
thus analyzed the GI microbiomes of 121 CD patients, 75 UC patients, and 27 healthy controls
using a novel multivariate metagenomic analysis pipeline specifically accounting for
environmental factors (including treatment, age, and tobacco use). In addition to assessing
microbiome composition, we have analyzed the inferred metagenome as determined from
phylogenetically-associated reference genomes, including metabolic modules and pathways also
associated with disease status and with environmental factors such as medications and
smoking. Not only were these GI microbiomes characterized by shifts in bacterial populations
during disease as previously described, but these dysbioses were highly functionally
coordinated. Cross-species enrichments included mucin metabolism and redox tolerance by means
of glutathione transport, cysteine biosynthesis, and riboflavin metabolism. Conversely,
processes linked broadly to clades IV and XIVa Clostridia were depleted, particularly
short-chain fatty acid production. Dysbioses in IBD are correspondingly not simply structural
changes in the gut microbiota, but are instead associated with major impairments in many
fundamental microbial metabolic functions with potential impact on the host.

Results 

In order to measure compositional and functional differences between the gut microbiota of
healthy and IBD-affected individuals, 231 fecal and biopsy samples were collected from the
Ocean State Crohn’s and Colitis Area Registry (OSCCAR) and the Prospective Registry in IBD
Study at MGH (PRISM) database. OSCCAR is a state-based, prospective inception IBD cohort, and
PRISM is a referral center-based, prospective IBD cohort (see Materials and methods). The
samples comprised 136 fecal specimens and 95 colon or small intestinal biopsies, originating
from a cross-section of 121 CD patients, 75 UC patients, 27 healthy controls, and 8
indeterminate (Table 1). In addition to general information such as gender and age, data
regarding disease characteristics (topography, disease activity as measured by the
Harvey-Bradshaw Index (HBI) and the Simple Colitis Activity Index), treatment (antibiotics,
corticosteroids, mesalamine, immunosuppressant), and environmental exposure (tobacco use)
were collected from each subject and analyzed. DNA was extracted from fecal samples and
biopsies, and the 16S rRNA gene was amplified and sequenced using 454 technology. The
resulting sequences were then processed using a specific in silico pipeline for sequence
cleaning and phylotype assignment (see Materials and methods). At the end of this process,
the average sequencing depth was 2,860 reads per sample. These data were first validated by
comparison with previous work, recapitulating previously observed changes in microbial
community composition during IBD and attributing several to host treatment or environment.
They were subsequently associated with reference genomes in order to discover
disease-associated modulations of microbial function and metabolism. A subset of 11 samples
(7 healthy, 4 CD) was subjected to whole-genome shotgun sequencing using the Illumina MiSeq
platform at an average depth of 119 meganucleotides per sample in order to confirm these
functional inferences.

Assessing significant covariation of microbiome structure
with host IBD status, treatment, and environment

We used a sparse multivariate statistical approach to relate disease phenotype to microbiome
structure and function while accounting for potential correlates and confounding factors such
as treatment or smoking. Metadata features potentially associated with each clade were first
selected using boosting, and the significance of these associations was then assessed using a
multivariate linear model with false discovery rate correction (see Materials and methods).
We first investigated the resulting association of microbial clades with IBD and with
features of our cohorts, testing all available metadata and clades from the genus to phylum
levels. Ordination of overall relationships among samples and host status revealed several
major combinations of environmental factors that co-varied with the microbiome (Figure 1;
Additional file 1). For example, UC covaried in this population with mesalamine treatment,
whereas CD patients were more often assessed by biopsy, treated with immunosuppressants, and
enriched for Escherichia. Similarities among microbiome compositions in disease subtypes
reflect those previously observed [34,35], with ileal CD (iCD) representing a strong
outgroup, UC a generally less-extreme microbial phenotype (less dissimilar from healthy
subjects), and non-iCD a broad distribution of microbiome configurations.

Table 1 Characteristics of patients in this study

                            CD                UC                HS                Indeterminate
n                           121               75                27                8
Female gender (n)           59.5% (72)        49.3% (37)        55.6% (15)        62.5% (5)
Age (lower 95%-upper 95%)   37.3 (34.3-40.3)  41.1 (37.4-44.9)  35.1 (29.1-41.2)  26.9 (13.4-40.3)
Smoker (n)
  Never                     63.6% (77)        57.3% (43)        85.2% (23)        75.0% (6)
  Previously                24.8% (30)        40% (30)          11.1% (3)         12.5% (1)
  Current                   10.7% (13)        2.7% (2)          0% (0)            12.5% (1)
  Unknown                   0.8% (1)          0% (0)            3.7% (1)          0% (0)
Sample
  Stool (n)                 51.2% (62)        64% (48)          66.7% (18)        100% (8)
  Biopsy (n)                48.8% (59)        36% (27)          33.3% (9)         0% (0)
Disease
  Active disease (n)^a      26.4% (32)        29.3% (22)        0% (0)            0% (0)
  Ileal (n)                 35.5% (43)        NA                NA                NA
Treatment
  Mesalamine (n)            55.4% (67)        77.3% (58)        0% (0)            75.0% (6)
  Steroids (n)              31.4% (38)        37.3% (28)        0% (0)            50% (4)
  Immunosuppressant (n)^b   38.8% (47)        16% (12)          0% (0)            0% (0)
  Antibiotics (n)           12.4% (15)        13.3% (10)        0% (0)            12.5% (1)

^a Active disease defined by a Harvey-Bradshaw Index (HBI) > 5 or Pediatric Crohn's Disease
Activity Index (pCDAI) > 10 for Crohn's disease (CD), and Simple Clinical Colitis Activity
Index > 5 or Pediatric Ulcerative Colitis Activity Index (pUCAI) > 10 for ulcerative colitis
(UC). ^b Immunosuppressant treatments include thiopurines, methotrexate, and anti-tumor
necrosis factor-α antibody. HS, healthy subjects; NA, not applicable.
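The false discovery rate correction used when testing clade associations can be illustrated with a small sketch. The text here does not name the procedure, so the Benjamini-Hochberg step-up method is an assumption, as are the function name `benjamini_hochberg` and the toy p-values; only the q < 0.2 cutoff mirrors the threshold reported for associations in this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy p-values for five hypothetical clade-metadata associations
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
significant = [i for i, qi in enumerate(q) if qi < 0.2]
```

With many clades tested against many metadata features, a per-test threshold on raw p-values would admit many chance associations; controlling the false discovery rate bounds the expected fraction of reported associations that are spurious.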

An important consideration that informed the remainder of our analysis, and which is often
overlooked in studies of the microbiome, was the consistent covariation among disease status,
aspects of subject environment, and microbiome structure. For example, the factor most
associated with changes in microbiome composition was not disease but whether the sample
origin was stool or biopsy. Biopsy location induced minor changes in microbiome composition
(Additional files 2, 3 and 4) relative to the extreme differences between stool and biopsy
communities, in agreement with previous studies [36,37]. In this cohort, iCD was always
represented by biopsy, whereas 18.4% of non-iCD and 36% of UC samples were biopsies. iCD was
also associated with greater likelihood of immunosuppressant treatment: iCD, non-iCD, and UC
patients were treated by immunosuppressants in 74.4%, 19.2% and 16% of samples, respectively.
In contrast, non-iCD and UC cases were more likely to be treated with mesalamine or
antibiotics: mesalamine was used for 30.2% of iCD samples, 69.2% of non-iCD samples, and
77.3% of UC samples, while antibiotics were used in 2.3% of iCD, 17.9% of non-iCD, and 13.3%
of UC samples. These associations lead to a range of non-independent covariates. Although
disease activity may influence microbiome composition, after adjusting for the other factors,
it was not independently associated with a specific shift in the microbiome composition in
our analysis, and there were no significant (P < 0.01) associations between microbiome
composition and gender (Additional file 5).

The second largely independent factor influencing microbiome composition was age, itself
negatively associated with smoking (Figure 1; Additional file 1). Twenty-four (10.4%) of the
available subjects were less than 18 years of age and 26 were 60 years or older. Aging is
associated with continual changes in the microbiome, primarily a gradual decrease in
Bifidobacterium as observed here (Additional file 6) and by others [38,39]. After observing
these overall patterns of covariation among disease, treatment, environment, and gut
microbiome composition, we continued our analysis only after assessing the significance of
microbiome-disease associations in a multivariate manner to account for host environment and
treatment.

Figure 1 Covariation of microbial community structure in IBD with treatment, environment,
biometrics, and disease subtype. Fecal and biopsy samples from 228 IBD patients and healthy
controls are plotted as squares (ileal CD) or circles (not ileal involved) and colored by
disease status. Axes show the first two components of overall variation as determined by
multiple factor analysis (see Materials and methods). Covariation in the presence of clinical
factors (bold) and in microbial taxa (italic) is shown. Sample origin (biopsy versus stool) is
the single most influential factor in determining microbial community structure, accompanied
by host age, treatment types, and disease (particularly ileal CD).

Microbial clades differentially abundant specifically in IBD
include Roseburia, the Ruminococcaceae, and the
Enterobacteriaceae

After adjusting for these covariates, we determined microbial clades differing significantly
in abundance between healthy and IBD subjects (Figure 2a; Additional file 1). This considered
age, smoking, and treatment factors (immunosuppressant, corticosteroids, mesalamine,
antibiotics), as well as disease activity at sampling and sample type (stool or biopsy). Two
genus-level phylotypes, Roseburia and Phascolarctobacterium, were significantly reduced in
both UC and CD, while Clostridium increased, all with false discovery rate q < 0.2. Roseburia
is a clade XIVa Clostridia and thus associated with anti-inflammatory regulatory T cell
production in the gut [40]. Cultured Roseburia have been described as acetate utilizers and
butyrate producers [41], while cultured Phascolarctobacterium are exclusively succinate
consumers, and produce propionate when co-cultured with Paraprevotella [42]. Thus, an
IBD-associated decrease in Roseburia and Phascolarctobacterium may reflect a decrease in
butyrate and propionate production.

The Ruminococcaceae, which are acetate producers [43], were decreased in CD, while the
Leuconostocaceae, which produce acetate and lactate [44], were decreased in UC. The only
major clade with a significant increase in abundance specific to CD was the
Enterobacteriaceae, specifically Escherichia/Shigella. This family has been previously
implicated in intestinal inflammation [6,45-47].

Crohn's  disease  with  ileal  involvement  presents  a  distinct 
microbiome  phenotype  including  reduced 
Faecalibacterium,  and  Odoribacter  is  reduced  both  in 
iCD  and  in  pancolonic  UC 

In CD patients with ileal involvement, sequences of the Ruminococcaceae family and of
Faecalibacterium in particular were dramatically reduced compared to other subjects
(Figure 2a), confirming previous studies [48,49]. Faecalibacterium prausnitzii, the only
cultured representative of Faecalibacterium, is able to metabolize both diet-derived
polysaccharides and host-derived substrates such as N-acetyl glucosamine from intestinal
mucus [50]. It is also a major butyrate producer and exhibits anti-inflammatory effects in a
colitis setting [51]. The Ruminococcaceae represent the first step of microbiome-linked
carbohydrate metabolism, as they degrade several types of polysaccharides in the lower GI
tract, including starch, cellulose, and xylan [21]. The Roseburia genus, which is
significantly reduced in all IBD patients (including iCD), and the Ruminococcaceae are
further functionally connected in that the latter consume hydrogen and produce acetate that
can be utilized by Roseburia to produce butyrate [41,43]. Consistent reductions in all of
these clades may thus have functional consequences on the ability of the host to repair the
epithelium and to regulate inflammation.

Figure 2 Significant associations of microbial clade abundance and community ecology with IBD
and treatment. (a) Taxonomic distribution of clades significant to disease and ileal
involvement. Abundant clades not significantly associated with IBD are annotated in gray for
context (top 90th percentile of at least 10% of samples and including 5+ genera). Node
(non-associated clade) sizes are proportional to the log of the clade's average abundance.
(b) Significance of association of sample ecology with disease (CD/UC, ileal/pancolonic),
treatment (antibiotics, immunosuppression, mesalamine, steroids), and environment (smoking,
stool/biopsy sample origin). Diversity (Simpson’s index), evenness (Pielou's index), and
richness (Chao1) were calculated for each community (see Materials and methods). False
discovery rate q-values are -log10 transformed for visualization, such that values > 0.60
correspond to q < 0.25. Antibiotic treatment is strongly associated with reduced diversity,
and stool samples with increased diversity relative to biopsies.
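The three community ecology measures named in the Figure 2 caption (Simpson's diversity, Pielou's evenness, and Chao1 richness) can be sketched from a vector of per-taxon read counts. This is a minimal illustration under stated assumptions: the exact variants used in the study are given in its Materials and methods, so the Gini-Simpson form (1 minus the sum of squared proportions) and the bias-corrected Chao1 fallback for communities without doubletons are choices made here, and the function names and counts are hypothetical.

```python
import math


def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2) over observed taxa."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, ln(number of observed taxa)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if len(props) < 2:
        return 0.0  # evenness is undefined for a single taxon; report 0
    shannon = -sum(p * math.log(p) for p in props)
    return shannon / math.log(len(props))


def chao1_richness(counts):
    """Chao1 estimate from observed taxa, singletons, and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons == 0:
        # Bias-corrected form avoids division by zero
        return observed + singletons * (singletons - 1) / 2.0
    return observed + singletons ** 2 / (2.0 * doubletons)


even_community = [10, 10, 10, 10]   # four equally abundant taxa
rare_tail_community = [5, 1, 1, 2]  # two singletons, one doubleton

even_simpson = simpson_diversity(even_community)
even_pielou = pielou_evenness(even_community)
rare_chao1 = chao1_richness(rare_tail_community)
```

Diversity and evenness depend on how reads are distributed among taxa, whereas Chao1 uses the rarest taxa to estimate how many were missed at the given sequencing depth, which is why the three can respond differently to the same treatment.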

The genera Escherichia/Shigella (indistinguishable as a 16S-based phylotype) were
particularly highly enriched in iCD (q < 0.2; Additional file 1) above their general
overabundance in CD patients. Lipopolysaccharide produced by Gram-negative bacteria such as
Escherichia coli is a canonical microbe-associated molecular pattern, known to activate
toll-like receptor 4 (TLR4) signaling [52] and thus trigger inflammatory cascades. TLR4
expression is highly up-regulated in the intestinal epithelium of IBD patients [53], and
mutations in TLR4 are associated with both CD and UC [54]. Previous culture-based studies
have found that E. coli, specifically E. coli exhibiting pathogen-like behaviors such as
adhesion and invasiveness [55], are more frequently cultured from iCD biopsies, and
culture-independent studies have found an enrichment in E. coli that contain
virulence-associated genes in iCD [6]. This suggests that CD-involved ileum is a favorable
milieu for establishment of E. coli with pathobiont features, which may have implications for
IBD exacerbations and its chronicity. An inflamed ileum may furnish a specialized niche
permissive for microbes with enhanced fitness in inflamed conditions.

The  most  severe  form  of  UC  is  pancolitis,  in  which 
UC  affects  the  entire  colon;  this  condition  is  associated 
with  greatly  increased  risk  of  colon  cancer  [56].  Patients 
with  pancolitis  did  not  harbor  a  clear  specificity  in  their 
dysbiosis.  However,  both  these  patients  and  iCD  patients 
had  a  reduced  abundance  of  the  Odoribacter  genus, 
which  belongs  to  the  Porphyromonadaceae  family  and 
to the Bacteroidetes phylum. As Odoribacter splanchnicus is a known producer of acetate,
propionate, and butyrate [57], decreased Odoribacter may affect host inflammation via reduced
SCFA availability.

Microbiome  composition  is  also  strongly  associated  with 
subject  age,  treatment,  smoking,  and  sample 
biogeography 

In the process of identifying microbiome perturbations specific to IBD, our multivariate
model simultaneously analyzed the surprisingly diverse effects of environmental and treatment
factors on GI microbial communities (see selection in Figure 3; complete data in Additional
file 1). We observed a significant correlation between increasing age and decreasing
Bifidobacterium (Additional file 6). The Firmicutes phylum also significantly decreased while
Bacteroides increased with age in this cohort (Additional file 1); this agrees with previous
studies [38,39] and potentially reflects dietary or body mass-related changes with increasing
age, which were not directly measured in these subjects, or host metabolism modifications
[58].

Critical to determining causality in links between IBD and the gut microbiome, IBD treatments
were also associated with alterations in microbiome composition. Mesalamine
(5-aminosalicylic acid) is a bowel-specific aminosalicylate drug. Although its exact mode of
action is unknown, it is thought to act as an antioxidant and to decrease intestinal
inflammation, in part by peroxisome proliferator-activated receptor-γ (PPARγ) activation and
inhibition of NF-κB and pro-inflammatory eicosanoid production. Here, its use was linked to
strong reductions in Escherichia/Shigella (> 100% of average abundance, q < 0.04; Additional
file 7), in agreement with a recent study [59]. Both 5-aminosalicylic acid and
immunosuppressant treatment were associated with modest increases in Enterococcus, the only
genus perturbed in immunosuppressant-treated patients with low false discovery rate (also
> 100% of average abundance, q < 0.09).

Figure 3 Select microbial clades significantly linked to host environment and treatment.
Anaerostipes decreased significantly in the gut communities of smokers, and Dorea,
Butyricicoccus, and Coriobacteriaceae were among the taxa most reduced in patients receiving
antibiotics (Abx). These associations were significant even in a multivariate model
accounting for sample biogeography and disease status. Sqrt, square root.

Antibiotics were among the strongest factors associated with a reduction in ecological
diversity (Figure 2b). Many individual clades were greatly reduced or nearly absent after
administration of antibiotics, including Collinsella, Dorea, Butyricicoccus,
Subdoligranulum, and Acetivibrio (all q < 0.2; Additional file 1). These genera are
predominantly from the Clostridiales order, Gram-positive and anaerobic bacteria that are
targeted by the antibiotics commonly used in IBD, such as ciprofloxacin and metronidazole.

Smoking is likely the best-known environmental factor that impacts IBD [60]. It is
associated with increased risk of CD and is conversely protective towards developing UC [61].
The only common organism to which tobacco usage was linked in these individuals was
Anaerostipes (Firmicutes phylum), which decreased (> 60% average abundance, q < 0.15;
Figure 3) in current or former tobacco users, beyond any change due solely to smokers’ higher
average age. Members of the genus Anaerostipes can utilize lactate to produce butyrate [62],
which is beneficial to colonic health.

Finally, as previously mentioned, samples of the stool as opposed to mucosal biopsies
differed strongly in microbiome composition (Additional file 2). More than 70 clades were
significantly over- or under-enriched in stool samples relative to biopsies at q < 0.2. This
effect extended to entire phyla, as the Firmicutes were approximately twofold more abundant
in stool (Additional file 1). Microbial habitat dictates the composition of microbial
communities [36]; in the GI tract, this has been suggested to occur on biogeographical scales
of intestinal regions [37,63] or even millimeters apart [64,65], and luminal/mucosal
differences may be further perturbed by bowel preparation prior to colonoscopy [66]. The data
did not suggest that the luminal and mucosal communities were independent; rather, all 14
clades significantly associated with IBD retained the same trend when stratified by sample
origin (Additional file 8). The fecal microbiome appeared to convey a consistent but
numerically transformed function of mucosal communities, both of which shifted in
composition in association with host environment, treatment, and disease.
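The stratified check described above, in which a disease association is re-examined separately within stool and biopsy samples, can be sketched as follows. This is only an illustration: it compares unadjusted group means rather than fitting the multivariate model used in the study, and the function name `stratified_trend`, the record layout, and all abundance values are hypothetical.

```python
from statistics import mean


def stratified_trend(records, clade):
    """Direction of the IBD-versus-healthy difference within each sample origin."""
    trends = {}
    for origin in sorted({r["origin"] for r in records}):
        ibd = [r[clade] for r in records if r["origin"] == origin and r["ibd"]]
        healthy = [r[clade] for r in records
                   if r["origin"] == origin and not r["ibd"]]
        if not ibd or not healthy:
            continue  # stratum lacks one of the two groups
        difference = mean(ibd) - mean(healthy)
        if difference > 0:
            trends[origin] = "up"
        elif difference < 0:
            trends[origin] = "down"
        else:
            trends[origin] = "flat"
    return trends


# Toy relative abundances (hypothetical values, not study data)
records = [
    {"origin": "stool", "ibd": False, "Roseburia": 0.08},
    {"origin": "stool", "ibd": True, "Roseburia": 0.03},
    {"origin": "stool", "ibd": True, "Roseburia": 0.04},
    {"origin": "biopsy", "ibd": False, "Roseburia": 0.02},
    {"origin": "biopsy", "ibd": True, "Roseburia": 0.01},
]
trends = stratified_trend(records, "Roseburia")
same_direction = len(set(trends.values())) == 1
```

In this toy data the absolute abundances differ between stool and biopsy, yet the direction of the disease effect is the same in both strata, which is the pattern the text describes as a consistent but numerically transformed signal.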

In a closer analysis of intestinal biogeography as reflected by biopsies drawn from distinct
regions, differences in most clades were modest and correlated largely with previously
described changes in pH (Additional files 2, 3 and 9) [67]. The clades with the largest
regional changes included the Roseburia and Ruminococcaceae, with lower abundance in the
low-pH terminal ileum, transverse, and right colon; Alistipes, following a similar pattern;
and the Fusobacteria and Enterobacteriaceae, with an opposite pattern of somewhat increased
abundance in the ileum and right colon. Particularly as the former have also been associated
with the colorectal cancer microenvironment in previous work [68,69], it is of note that
these variations in the microbiota with respect to biogeography and pH are similar to those
we observed with respect to IBD and potentially redox status as detailed below.

The  metagenomic  abundances  of  microbial  metabolic 
pathways  are  more  consistently  perturbed  in  IBD  than 
are  organismal  abundances 

We  continued  our  analysis  by  combining  community  com¬ 
position  with  over  1,200  annotated  genomes  from  the 
Kyoto  Encyclopedia  of  Genes  and  Genomes  (KEGG)  cata¬ 
log  [70].  The  genes  annotated  within  each  available  refer¬ 
ence  genome  were  used  to  provide  an  approximate  gene 
catalog  for  each  community  (see  Materials  and  methods), 
which  we  reconstructed  into  metabolic  pathways  (Figure  4) 
and  smaller  modules  and  biological  processes  (Figure  5; 
Additional  file  10)  as  previously  described  [71].  Pathway, 
module,  and  process  abundances  were  then  associated  with 
disease  and  host  environment  using  the  same  sparse  multi¬ 
variate  model  with  which  microbial  abundances  were 
assessed  (Additional  files  11, 12  and  13). 

Considering  only  the  contrast  between  IBD  (CD  or 
UC)  and  healthy  subjects,  24  of  200  (12%)  total  meta¬ 
bolic  modules  were  differentially  abundant  at  q  <  0.2. 
This  is  in  stark  contrast  to  the  microbial  shifts  dis¬ 
cussed  above,  in  which  only  6  of  263  (2%)  genus-level 
clades  reached  this  significance  threshold.  Even  in  the 
absence  of  metagenomic  or  metatranscriptomic  data 
and  only  leveraging  the  genes  and  pathways  in  refer¬ 
ence  genomes  associated  with  these  communities, 
changes  in  microbial  function  were  more  consistent 
than  changes  in  community  structure.  This  has  been 
noted  in  environmental  communities  [72]  and  sug¬ 
gested  with  respect  to  obesity  and  other  biometrics 
[73,74],  but  to  date  it  has  not  been  reported  for  dis¬ 
ease-linked  dysbioses  or  IBD. 

We  validated  these  functional  shifts  by  shotgun  meta¬ 
genomic  sequencing  of  the  small  subset  of  available  sam¬ 
ples  with  appropriate  stool  DNA,  seven  healthy  controls 
and  four  CD  patients  (Additional  file  14).  These  were 
sequenced  to  a  shallow  depth  averaging  119  meganucleo¬ 
tides  per  sample  of  150-nucleotide  paired-end  Illumina 
MiSeq  reads,  reducing  our  effective  limit  of  detection  but 
otherwise  providing  close  agreement  with  inferred  meta¬ 
bolic  shifts  in  the  IBD  metagenome.  Of  the  modules 
highlighted  below  and  in  Figure  5,  one  (cobalamin  bio¬ 
synthesis)  fell  below  the  limit  of  detection,  and  the 
remaining  six  retained  the  expected  trend  of  over-  or 
under-enrichment  in  Crohn’s  disease,  as  did  additional 
processes  detailed  below,  including  glycolysis  and  bacter¬ 
ial  secretion. 
Figure 4 Microbial metabolic pathways with significantly altered abundances in the gut communities of IBD patients. Abundance of
KEGG metabolic pathways in microbiome samples is colored by disease state and, when significant, stratified by ileal involvement. Basic
metabolism (for example, most amino acid biosynthesis) and SCFA production were reduced in abundance in disease, while biosynthesis and
transport of compounds advantageous for oxidative stress (for example, sulfur, cysteine, riboflavin) and adherence/pathogenesis (for example,
secretion) were increased.

Amino acid biosynthesis and carbohydrate metabolism
are reduced in the IBD microbiome in favor of nutrient
uptake

We  observed  that  even  basic  GI  microbiome  metabo¬ 
lism  was  altered  in  both  UC  and  CD.  Amino  acid 
metabolism  showed  major  perturbation:  genes  for  the 
metabolism  and  biosynthesis  of  nearly  all  amino  acids 
(particularly  histidine  and  lysine)  decreased  in  abun¬ 
dance  (Figure  4),  while  arginine,  histidine,  and  lysine 
transport  (Figure  5)  gene  abundance  increased.  In  iCD 
we  also  observed  a  decrease  in  glutamine-related  func¬ 
tional  modules,  which  would  lead  to  a  lower  amount  of 
glutamate  required  for  gamma-aminobutyric  acid, 
ornithine,  and  arginine  biosynthesis;  abundance  of  all 
three of these modules also decreased. In marked
contrast to the other amino acids, genes for metabolism
of the sulfur-containing amino acid cysteine significantly
increased in abundance, with an even greater
increase in iCD. This corresponded with an overrepresentation
of genes related to sulfate transport in UC
and CD (Figure 5), and an increase in sulfur and nitrogen
metabolism in CD (Figure 4).

CD  was  associated  with  increased  abundance  of  many 
genes  related  to  carbohydrate  transport  (Figure  5).  There 
were  large  increases  in  pentose  phosphate  pathway  and 
fructose/mannose  metabolism  gene  abundance  in  iCD 
(Figure 4), which were accompanied by an increase in carbohydrate
metabolism, but these changes were not significant in UC
and CD. In addition, iCD showed increased abundance of
transporter  genes  for  glucose,  hexoses,  maltose,  and 
mono-, di-, and oligosaccharides (Figure 5). We observed
a decrease in both butanoate and propanoate metabolism
in iCD (Figure 4), suggesting a potential decrease in
SCFA production by the microbiome, possibly due to the
observed decrease in Roseburia and Faecalibacterium.

Figure 5 Small metabolic modules and biological processes with significantly altered abundances in the IBD microbiome. (a, b) Small
(typically 5 to 20 gene) KEGG modules (a) and independently defined biological processes from the Gene Ontology (b) were assessed for
significant association with disease and ileal involvement as in Figure 4. Metabolism related to oxidative stress (for example, glutathione and
sulfate transport) and for pathobiont-like auxotrophy (for example, N-acetylgalactosamine and amino acid uptake) is increased, while several
basic biosynthetic processes are less abundant.

We  saw  an  increase  in  glutathione  transport  gene  abun¬ 
dance  in  UC  and  CD  (Figure  5)  and  an  increase  in  glu¬ 
tathione  metabolism  gene  abundance  in  UC.  Glutathione 
is a tripeptide of glutamate, cysteine, and glycine, synthesized by
Proteobacteria and a few streptococci and enterococci
[75], which allows bacteria to maintain homeostasis during
oxidative  or  acid  stress.  Inflammatory  cascades  include 
production  of  highly  reactive  oxygen  and  nitrogen  meta¬ 
bolites,  which  are  greatly  increased  in  active  IBD  [76]. 
Lamina propria monocytes also release homocysteine during
inflammation, which further contributes to oxidative
stress;  IBD  is  associated  with  higher  levels  of  both  mucosal 
and  serum  homocysteine  [77].  Thus,  the  increases  in 
sulfate  transport,  cysteine  metabolism,  and  glutathione 
metabolism  may  reflect  a  mechanism  by  which  the  gut 
microbiome addresses the oxidative stress caused by
inflammation. 

Extreme  functional  shifts  in  iCD  include  changes  in  redox 
metabolism,  enrichment  of  signaling/secretion,  and 
suggest  a  'pathobiont-like'  invasive  metagenome 

CD  with  ileal  involvement  exhibited  specific  dysfunction 
at  the  module  level.  It  was  associated  with  an  increase  in 
several  modules  involved  in  glycolysis  and  carbohydrate 
transport  and  metabolism  (Figure  5).  Conversely,  iCD 
exhibited  lower  abundance  of  genes  involved  in  lipid 
metabolism  and  catabolism,  confirming  a  major  imbal¬ 
ance  in  energy  metabolism.  We  observed  a  global 
decrease  in  nicotinamide,  purine,  and  pyrimidine  nucleo¬ 
tide  biosynthesis  modules  in  iCD,  CD,  and  UC  (Figure  5). 

There  was  a  decrease  in  vitamin  biosynthesis  associated 
with  iCD,  but  increases  in  thiamine  and  particularly  ribo¬ 
flavin  metabolism  modules  (Figure  4).  Interestingly,  this 
pathway  is  fed  by  the  pentose  phosphate  pathway,  which 
was  also  overrepresented  in  iCD.  Riboflavin  is  necessary 
for  regenerating  oxidized  glutathione  back  to  its  reduced 
form,  and  is  thus  essential  for  pH  and  oxidative  stress 
homeostasis,  as  is  NADPH,  a  product  of  the  pentose 
phosphate  pathway.  Metabolism  of  the  sulfur-containing 
amino  acids  cysteine  and  methionine  was  increased  in 
iCD,  in  marked  contrast  to  the  IBD-associated  decreases 
in  the  non-sulfur-containing  amino  acids  such  as  lysine 
and  glutamine.  As  homocysteine  is  easily  convertible  to 
methionine,  this  may  indicate  a  further  mechanism  of 
maintaining  redox  homeostasis.  Alternatively,  this  may 
be  connected  to  the  iCD-specific  increase  in  carbohy¬ 
drate  metabolism,  as  cysteine  may  be  metabolized  to 
pyruvate. 

Finally,  genes  involved  in  pathogenesis  processes,  such 
as  secretion  systems  and  adherence/invasion,  were  over¬ 
represented  in  iCD  (Figure  4).  For  example,  genes 
involved  in  the  shigellosis  pathway  were  more  abundant 
in  CD,  and  type  II  secretion  genes  were  more  abundant 
in  iCD.  Type  II  secretion  is  involved  in  the  secretion  of 
cell  wall-degrading  enzymes  [78]  and  the  secretion  of 
toxins  such  as  heat-labile  enterotoxin,  similar  to  cholera 
toxin  [79].  These  functions  are  typical  of  pathobiont 
adherent-invasive  E.  coli,  which  have  been  observed  to 
increase  in  iCD  in  our  own  study  and  others  [6,55].  This 
may  be  associated  with  tissue  damage,  either  primarily  as 
a  result  of  toxin  secretion,  or  secondarily  as  a  result  of 
stimulated  cytokine  production.  This  tissue  destruction  is 
a  likely  source  of  metabolites  for  microbial  overgrowth, 
selecting  for  auxotrophic  specialists  able  to  thrive  in  this 
environment  and  resulting  in  the  microbiome-wide  loss 
of  basic  biosynthetic  processes  (Figures  4  and  5).  This 
would  in  turn  lead  to  further  tissue  breakdown,  bacterial 
overgrowth,  and  community  structural  and  functional 
dysbiosis. 


Discussion 

The  GI  microbiome  influences  dietary  energy  extraction, 
immune  system  development,  vitamin  production,  and 
drug  metabolism,  yet  most  molecular  and  metabolic 
functions  of  the  bacteria  of  the  GI  microbiome  are 
uncharacterized  [20].  To  gain  insight  into  the  functional 
consequences  of  IBD-associated  dysbiosis,  we  used  a 
novel  approach  pairing  microbial  community  16S  gene 
sequence  profiles  with  information  from  the  closest  avail¬ 
able  whole-genome  sequences.  This  defined  an  inferred 
metagenome  and  thus  complement  of  metabolic  func¬ 
tional  modules  for  each  microbiome  in  this  study.  This 
allowed  us  to  identify  unique  functional  perturbations  in 
the  microbiomes  of  IBD  patients.  Interestingly,  although 
we  identified  only  nine  changes  in  bacterial  clades  that 
associated  with  UC  (of  350  total,  2.6%),  we  identified  21 
statistically  significant  differences  in  functional  pathways 
and  metabolic  modules  (of  295,  7.1%);  this  pattern  held 
for  CD  and  iCD  function  as  well.  This  underscores  the 
fact  that  phylogenetically  diverse  changes  in  the  composi¬ 
tion  of  the  GI  microbiome  can  be  functionally  coordi¬ 
nated  and  lead  to  major  modifications  in  the  metabolic 
potential  of  the  microbiota. 

The  microbial  metabolic  information  available  in  this 
study  represents  only  one  step  in  the  functional  investiga¬ 
tion  of  the  IBD  microbiota,  as  it  is  an  accurate  but  approx¬ 
imate  inference  using  prior  knowledge  of  microbial 
genomes.  The  metagenomes  inferred  from  our  16S  data 
were  supported  by  shotgun  sequencing  of  a  subset  of  sam¬ 
ples,  providing  one  confirmation  that  they  were  represen¬ 
tative  of  community  functional  capability.  As  sequencing 
costs  continue  to  fall,  rich  metagenomic  data  for  dozens 
or  hundreds  of  samples  will  further  improve  our  ability  to 
resolve  species-level  gene  function  in  communities.  Of 
course,  a  community  expresses  only  a  variable  subset  of  its 
functional  capability  at  any  given  time,  in  response  to 
environmental  stimuli.  Thus,  metatranscriptomic,  proteo- 
mic,  and  metabolomic  data  will  continue  to  add  to  our 
understanding  of  which  of  a  community’s  potential  func¬ 
tions  are  most  strongly  affecting  the  host  during  inflam¬ 
matory  disease. 

Combining  shifts  in  functional  module  abundance  with 
prior  knowledge  of  these  metabolic  pathways  provides 
fresh  insight  into  microbiome  dysfunction  in  IBD.  Meta¬ 
bolism  of  the  sulfur-containing  amino  acid  cysteine  was 
increased  in  both  UC  and  CD.  This  was  accompanied  by 
increases  in  riboflavin  metabolism,  glutathione  transpor¬ 
ters,  and  the  N-acetylgalactosamine  phosphotransferase 
system.  Mucin,  which  is  rich  in  cysteine  and  glycosylated 
sugars,  is  abundant  in  the  intestinal  epithelium,  and  it  is 
upregulated  during  inflammation.  The  increases  in 
cysteine  metabolism  and  N-acetylgalactosamine  trans¬ 
porters  may  reflect  a  shift  in  the  microbiome  towards 
greater  abundance  of  microbes  that  use  mucin  as  a 
primary energy source (Figure 6). This functionality suggests
activity at the mucosa and this may be problematic
for  a  damaged  IBD  epithelium  with  compromised  barrier 
function. 

Alternatively,  the  increased  biosynthesis  of  cysteine  (a 
precursor  of  glutathione)  and  of  glutathione  transport 
modules  may  speak  to  the  microbiome’s  response  to  the 
oxidative  stress  (high  levels  of  reactive  oxygen  and  nitro¬ 
gen  species)  of  the  inflamed  IBD  gut  [76].  In  support  of 
this  concept,  we  found  that  riboflavin  metabolism,  which 
is  required  to  convert  glutathione  between  its  oxidized 
and  reduced  forms,  is  increased  in  iCD.  Furthermore,  the 
pentose  phosphate  pathway,  which  produces  the  NADPH 
also  required  for  glutathione  reduction,  is  increased  as 
well.  Recent  studies  have  shown  that  redox  stress  allows 
Salmonella  to  use  ethanolamine  as  a  carbon  source  [80] 
and allows enterohemorrhagic E. coli to use it as a nitrogen
source [81], thus conferring a competitive advantage
to  these  microbes.  This  raises  the  interesting  possibility 
that  E.  coli  or  related  species  in  IBD  may  be  highly  repre¬ 
sented  because  they  gain  a  competitive  advantage  from 
oxidative  stress  and  are  better  able  to  compensate  for  it 
with  glutathione  production. 

In both UC and CD, there were decreases in the biosynthesis
of lysine, arginine, and histidine in favor of their transport;
a further decrease in tryptophan
metabolism  was  associated  with  iCD.  The  data  showed 
additional  broad  decreases  in  many  essential  processes, 
such as cobalamin synthesis, purine and pyrimidine biosynthesis,
lipid catabolism, and phospholipid metabolism, as
well  as  marked  increases  in  transport.  This  overall 
decrease  in  abundance  of  genes  for  amino  acid  and 
nucleotide  biosynthesis  bears  striking  resemblance  to  the 
lifestyle  of  highly  symbiotic  bacteria  that  are  intrinsically 
auxotrophic and also of some pathobionts (Figure 6). One
such example is the segmented filamentous bacteria (SFB),
symbionts that belong to Candidatus Arthromitis, a
sub-group  of  clade  I  (sensu  stricto)  Clostridia.  A  recently 
sequenced  SFB  genome  lacked  genes  for  nucleotide  bio¬ 
synthesis  as  well  as  nearly  all  vitamins  and  amino  acids 
[82,83].  SFB  are  often  abundant  in  the  rodent  terminal 
ileum and are responsible for the maturation of Th17 cells
[84],  which  play  an  important  role  in  CD-associated 
inflammation  [85].  To  date,  neither  SFB  nor  phylogeneti- 
cally  related  sequences  have  been  observed  in  humans 
[82,86];  this  was  also  true  in  our  data  (zero  16S  sequences 
with  >  90%  identity  to  X77814  SFB).  However,  a  functional 
trend  similar  to  SFB  was  observed  in  these  IBD  commu¬ 
nity  metagenomes,  as  biosynthetic  mechanisms  through¬ 
out  central  carbon  metabolism,  amino  acid  biosynthesis, 
and  nucleotide  maintenance  were  all  reduced  (Figures  4 
and  5),  hinting  that  humans  may  host  functional  equiva¬ 
lents  of  SFB-like  pathobionts  that  increase  in  IBD  but  are 
not  phylogenetically  close  to  Candidatus  Arthromitis. 


Host  tissue  destruction,  either  inflammation-mediated  or 
bacterially  mediated,  would  provide  a  ready  nutrient 
source  (Figure  6). 

Conclusions 

The  data  presented  here  show  that  IBD  and  iCD  in  parti¬ 
cular  are  associated  with  a  dysbiosis  characterized  by 
changes  in  Firmicutes  and  Proteobacteria  phyla.  Environ¬ 
mental  factors  and,  notably,  treatments  were  also  asso¬ 
ciated  with  independent  changes  in  the  GI  microbiome; 
these  must  be  taken  into  account  during  future  studies  of 
the  microbiota  in  IBD.  These  perturbations  in  bacterial 
composition,  although  modest,  were  associated  with  major 
perturbations  of  GI  microbiome  function,  which  revolved 
around  metabolism  in  the  presence  of  oxidative  stress  and 
perturbed  nutrient  availability  during  tissue  damage. 
Further  studies,  particularly  including  transcriptomic,  pro- 
teomic,  or  metabolomic  characterization,  longitudinal 
data,  and  dietary  metadata,  will  be  needed  to  additionally 
define  the  consequences  of  the  IBD-associated  micro¬ 
biome  dysfunction  on  the  host  and  the  specific  mechan¬ 
isms  by  which  they  are  carried  out  or  regulated  by  the 
microbiota. 

Materials  and  methods 

The  OSCCAR  and  PRISM  cohorts 

The  Ocean  State  Crohn’s  and  Colitis  Area  Registry 
(OSCCAR)  is  a  state-based,  prospective  inception  cohort 
of  IBD  patients  that  was  designed  to  study  the  epidemiol¬ 
ogy  of  IBD,  to  determine  the  incidence  of  IBD  in  Rhode 
Island,  and  to  extrapolate  these  rates  to  the  general  popu¬ 
lation  of  the  United  States.  The  diverse  population  of 
over  1  million,  limited  geographic  range,  and  well-cir¬ 
cumscribed  gastroenterology  community  of  Rhode  Island 
were  ideal  circumstances  for  establishing  a  prospective 
inception  cohort  of  IBD  patients.  All  but  one  of  the  98 
gastroenterologists/colorectal  surgeons  in  Rhode  Island 
agreed  to  refer  patients  to  OSCCAR,  and  11  gastroenter¬ 
ologists  practicing  in  Massachusetts  just  over  the  Rhode 
Island  border  also  agreed  to  refer  their  newly  diagnosed 
IBD  patients  who  resided  in  Rhode  Island.  Enrollment 
began  1  January  2008.  All  Rhode  Island  residents  with  a 
newly  confirmed  diagnosis  of  CD,  UC,  or  indeterminate 
colitis  were  eligible  for  inclusion  (within  12  months  from 
diagnosis).  Ethnic  background  of  the  subjects  was  not 
available  for  consideration  in  the  analysis,  and  indetermi¬ 
nate  colitis  patients  were  analyzed  only  for  other  meta¬ 
data  and  not  for  IBD  diagnosis.  Diagnosis  of  CD,  UC,  or 
indeterminate  colitis  was  made  by  endoscopic,  patholo¬ 
gic,  or  radiographic  findings  according  to  the  criteria  of 
the  National  Institute  of  Diabetes  and  Digestive  and 
Kidney  Diseases  (NIDDK)  IBD  Genetics  Consortium. 
OSCCAR  research  protocols  were  reviewed  and  approved 
by  three  institutional  review  boards  (Lifespan  (#0214-07), 
the Partners Human Research Committee (#2007-P-001705),
and the Program for the Protection of Human
Subjects/Mount Sinai School of Medicine (#11-01479)),
and all experiments adhered to the regulations of these
review boards. Informed consent and HIPAA (Health
Insurance Portability and Accountability Act) authorization
were obtained from each subject prior to study participation.
Individuals diagnosed with IBD prior to the
study start date, pregnant women, those unwilling to provide
informed consent for study participation, and those
who were prisoners at the time of diagnosis were not permitted
to enroll.

Figure 6 Proposed metabolic roles of the gut microbiome in IBD. Host-mediated processes (blue text) create an environment of oxidative
stress in the intestine, which is more favorable to Enterobacteriaceae (increased abundance) than to clades IV and XIVa Clostridia (decreased
abundance). This study's inferred IBD metagenomes include broadly increased oxidative metabolism, decreased SCFA production, and increased
mucin degradation relative to healthy subjects. These processes all occur within microbes and rely on transport of small molecules to and from
the lumen. The resulting tissue-destructive environment provides nutrients such as nucleotides and amino acids, which allow for increased
growth of auxotrophic 'specialists'. Bacterial clades of interest are indicated in orange, bacterially mediated processes increased in IBD in red, and
processes that decrease in green. Metabolic pathways differential in our IBD communities are contained in blue boxes. GSH and GSSG indicate
reduced and oxidized forms of glutathione. LPS, lipopolysaccharide; NAG, N-acetyl galactosamine.

The  Prospective  Registry  in  IBD  Study  at  MGH 
(PRISM)  is  a  referral  center-based,  prospective  cohort  of 
IBD patients. Enrollment began 1 January 2005. Patients
aged 18 years and older with a diagnosis of CD or UC
based  upon  standard  endoscopic,  radiographic,  and  histo¬ 
logic  criteria  were  eligible  to  participate.  Controls  con¬ 
sisted  of  healthy  patients  aged  18  years  and  older,  from 
whom  biopsies  were  obtained  during  colonoscopies  per¬ 
formed  for  screening  purposes. 

Patients  were  excluded  from  the  healthy  volunteer  group 
for  current  acute  illness,  if  awaiting  transplant,  or  if 
chronically  ill  (for  example,  renal  failure,  diabetes,  conges¬ 
tive  heart  failure).  During  routine  colonoscopies,  subjects 
were  offered  the  opportunity  to  donate  biopsy  samples. 
After  sampling,  intestinal  biopsies  were  stored  in  5%  gly¬ 
cerol  at  -80°C  until  DNA  extraction.  Stool  samples  were 
kept  at  4°C  for  less  than  24  h  before  storage  at  -80°C  until 
DNA  extraction.  PRISM  research  protocols  were  reviewed 
and approved by the Partners Human Research Committee
(#2004-P-001067), and all experiments adhered to the
regulations  of  this  review  board. 

DNA  extractions 

DNA  from  stool  and  biopsy  samples  was  extracted  using 
the  QIAamp  DNA  Stool  Mini  Kit  (Qiagen,  Inc.,  Valencia, 
CA,  USA)  according  to  manufacturer’s  instructions  and  as 
described  previously  [87].  The  manufacturer’s  protocol 
was  altered  to  accommodate  larger  stool  volumes  and  to 
improve  homogenization  using  bead-beating  at  several 
steps: a) a minimum of 2 ml of Buffer ASL and 300 mg of
stool was used in the protocol; b) a ratio of 700 µl of Buffer
ASL per 100 mg of stool weight was used for larger
volumes using no more than 1,500 mg of stool and 10.5
ml of Buffer ASL; c) following the addition of Buffer ASL
to  each  sample  (step  2),  0.70  mm  Garnet  Beads  (MO  BIO 
Laboratories,  Inc.,  Carlsbad,  CA,  USA)  were  added  to  the 
suspension  and  vortexed  for  10  seconds;  d)  a  second  bead¬ 
beating  was  performed  following  the  heating  of  the  sus¬ 
pension  (step  3)  in  0.1  mm  Glass  Bead  Tubes  (MO  BIO 
Laboratories,  Inc.),  and  vortexed  for  10  minutes. 
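
The buffer-to-stool ratio quoted above reduces to simple arithmetic. The following is a minimal sketch, assuming the 2 ml minimum acts as a floor on the computed volume and that the protocol applies only between 300 and 1,500 mg of stool; the function name and the range check are illustrative and are not part of the published protocol.

```python
def lysis_buffer_volume_ml(stool_mg: float) -> float:
    """Return the stool lysis buffer volume (ml) for a given stool mass (mg).

    Uses the ratio quoted in the text: 700 microlitres per 100 mg of stool,
    with at least 2 ml of buffer. The accepted mass range (300 to 1,500 mg)
    is an assumption drawn from the stated minimum and maximum.
    """
    if not 300 <= stool_mg <= 1500:
        raise ValueError("protocol as described covers 300 to 1,500 mg of stool")
    # 700 microlitres per 100 mg equals 7 ml per 1,000 mg.
    volume_ml = 7 * stool_mg / 1000
    return max(2.0, volume_ml)


if __name__ == "__main__":
    for mass in (300, 800, 1500):
        print(mass, "mg ->", lysis_buffer_volume_ml(mass), "ml")
```

At the upper limit of 1,500 mg the ratio gives the 10.5 ml maximum stated in the protocol.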

Amplification  and  454  sequencing  of  the  16S  gene 

The  16S  gene  dataset  consists  of  454  FLX  Titanium 
sequences  spanning  the  V3  to  V5  variable  regions. 
Detailed  protocols  used  for  16S  amplification  and 
sequencing  are  available  on  the  Human  Microbiome 
Project  Data  Analysis  and  Coordination  Center  website 
[88].  In  brief,  genomic  DNA  was  subjected  to  16S 
amplifications  using  primers  designed  incorporating  the 
FLX  Titanium  adapters  and  a  sample  barcode  sequence, 
allowing  directional  sequencing  covering  variable  regions 
V5 to partial V3 (primers: 357F 5’-CCTACGGGAGGCAGCAG-3’
and 926R 5’-CCGTCAATTCMTTTRAGT-3’).
PCR mixtures (25 µl) contained 10 ng of template,
1x Easy A reaction buffer (Stratagene, La Jolla, CA,
USA), 200 µM of each dNTP (Stratagene), 200 nM of
each primer, and 1.25 U AccuPrime hifi cloning enzyme
(Invitrogen, Carlsbad, CA, USA). The cycling conditions
for the V3-V5 region consisted of an initial denaturation of 95°C
for 2 minutes, followed by 25 cycles of denaturation
at  95°C  for  40  s,  annealing  at  50°C  for  30  s,  extension  at 
72°C  for  5  minutes  and  a  final  extension  at  72°C  for  7 
minutes.  Amplicons  were  confirmed  on  1.2%  Flash  Gels 
(Lonza,  Rockland,  ME,  USA),  purified  with  AMPure  XP 
DNA  purification  beads  (Beckman  Coulter,  Danvers, 
MA,  USA)  according  to  the  manufacturer,  and  eluted  in 
25 µl of 1x low TE buffer (pH 8.0). Amplicons were
quantified  on  Agilent  Bioanalyzer  2100  DNA  1000  chips 
(Agilent  Technologies,  Santa  Clara,  CA,  USA)  and 
pooled  in  equimolar  concentration.  Emulsion  PCR  and 
sequencing  were  performed  according  to  the  manufac¬ 
turer’s  specifications. 


Processing  sequencing  samples 

Sequences  were  processed  in  a  data  curation  pipeline 
implemented  in  MOTHUR  [89],  which  removed 
sequences  from  the  analysis  if  they  were  less  than  200 
nucleotides  or  greater  than  600  nucleotides,  had  a  low 
read  quality  score  (<  25),  contained  ambiguous  charac¬ 
ters,  had  a  non-exact  barcode  match,  or  had  more  than 
4  mismatches  to  the  reverse  primer  sequences  (926R). 
Remaining  sequences  were  assigned  to  samples  based 
on  barcode  matches,  and  barcode  and  primer  sequences 
were  then  trimmed.  Chimeric  sequences  were  identified 
using  the  ChimeraSlayer  [90]  algorithm,  and  reads  were 
classified  with  the  MSU  RDP  classifier  v2.2  [91]  using 
the  taxonomy  maintained  at  the  Ribosomal  Database 
Project  (RDP  10  database,  version  6).  Sequencing  depth 
after  processing  averaged  2,860  (standard  deviation 
1,730)  reads  per  sample. 
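
The curation criteria listed above can be illustrated as a single predicate over a read. This is a minimal sketch and not the MOTHUR implementation: the `Read` record and its field names are hypothetical, and summarizing read quality as one mean score is an assumption.

```python
from dataclasses import dataclass

# IUPAC nucleotide codes other than A, C, G, T count as ambiguous here.
AMBIGUOUS_BASES = set("NRYKMSWBDHV")


@dataclass
class Read:
    """Hypothetical per-read summary; field names are illustrative."""
    sequence: str
    mean_quality: float
    barcode_exact_match: bool
    reverse_primer_mismatches: int


def passes_curation(read: Read) -> bool:
    """Apply the filters described in the text to one read."""
    length = len(read.sequence)
    if length < 200 or length > 600:
        return False  # outside the accepted length window
    if read.mean_quality < 25:
        return False  # low read quality score
    if any(base in AMBIGUOUS_BASES for base in read.sequence.upper()):
        return False  # contains ambiguous characters
    if not read.barcode_exact_match:
        return False  # barcode must match exactly
    if read.reverse_primer_mismatches > 4:
        return False  # more than 4 mismatches to the 926R primer
    return True


if __name__ == "__main__":
    good = Read("ACGT" * 100, 30.0, True, 1)
    short = Read("ACGT" * 10, 30.0, True, 0)
    print(passes_curation(good), passes_curation(short))
```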

Metagenome  inference  from  microbiome  composition 

To  construct  an  approximate  gene  catalog  for  each  sam¬ 
ple  community,  we  used  the  gene  content  of  1,119  KEGG 
reference  genomes  to  infer  the  approximate  gene  content 
of our detected phylotypes. We first matched the FastTree
GreenGenes (GG) phylogeny [92] annotated with
these KEGG genomes’ organisms against the RDP taxonomy
used for phylotyping. Each clade in the RDP taxonomy
was mapped to the clade within the GG phylogeny
that  maximized  the  Jaccard  index  of  overlapping  named 
descendant  genomes.  That  is,  each  genus-level  phylotype 
was  assigned  to  the  GG  clade  containing  the  most  gen¬ 
omes  from  that  genus  and  fewest  from  other  genera. 
Higher-level  clades  continued  this  pattern  using  the  Jac¬ 
card  index  as  an  optimality  criterion.  The  gene  contents 
for  ancestral  clades  were  then  reconstructed  across  the 
GG  tree,  beginning  with  each  reference  genome  (tree 
leaf)  summarized  as  a  vector  of  KEGG  ortholog  (KO) 
[70]  copy  numbers  (0,  1,  or  multiple  copies  of  the  gene 
annotated  within  the  genome).  Gene  contents  of  each 
parent  GG  clade  g  were  calculated  by  averaging  all  des¬ 
cendant  genomes’  h  KO  vectors,  with  weight  w(g,  h) 
inversely  exponential  to  phylogenetic  distance: 

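\[
\mathbf{g} = \frac{\sum_{h \in \mathrm{descendants}(g)} w(g,h)\,\mathbf{h}}{\sum_{h \in \mathrm{descendants}(g)} w(g,h)},
\qquad
w(g,h) = e^{-\mathrm{dist}(g,h)}
\]

for GG tree nodes g and h separated by phylogenetic
branch length dist(g, h) and annotated with KO genome
vectors \(\mathbf{g}\) and \(\mathbf{h}\).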
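
The weighted averaging of descendant genomes can be sketched as follows. This is an illustration rather than the authors' code: the text describes the weight only as inversely exponential to phylogenetic distance, so the specific form exp(-dist) used here is an assumption, and the function name and input layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ancestral_ko_vector(
    descendants: Sequence[Tuple[float, Sequence[float]]],
) -> List[float]:
    """Average descendant KO copy-number vectors for one parent clade.

    `descendants` holds (branch_length_distance, ko_vector) pairs, one per
    descendant reference genome. Each genome is weighted by exp(-distance)
    (assumed form), so closer genomes contribute more to the parent.
    """
    if not descendants:
        raise ValueError("a clade needs at least one descendant genome")
    weights = [math.exp(-dist) for dist, _ in descendants]
    total_weight = sum(weights)
    n_kos = len(descendants[0][1])
    return [
        sum(w * vec[i] for w, (_, vec) in zip(weights, descendants)) / total_weight
        for i in range(n_kos)
    ]


if __name__ == "__main__":
    # Two equidistant genomes reduce to a plain mean of their KO vectors.
    print(ancestral_ko_vector([(0.5, [2, 0]), (0.5, [0, 2])]))
```

Because the weights are normalized by their sum, a clade with a single descendant genome simply inherits that genome's KO vector.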

Using  this  vector  representation  of  genomes,  the 
abundance  of  an  individual  gene  family  (KO)  i  in  a  com¬ 
munity  due  to  the  presence  of  a  specific  phylotype  g  is 
the product of the corresponding gene count g[i] and
the  measured  abundance  of  phylotype  g.  Therefore,  the 
total  relative  abundance  of  each  KO  was  estimated  for 
each  sample  by  adding  the  individual  contributions  of 
all  phylotypes  present  in  the  sample.  Using  this  method, 
we  inferred  the  functional  composition  for  each  sampled 
community.  The  inference  process's  accuracy  was  vali¬ 
dated  by  comparing  inferred  KO  abundances  in  16S 
datasets  from  the  Human  Microbiome  Project  with  their 
metagenomically  sequenced  counterparts  (Additional 
file  15). 
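
The per-sample inference step amounts to summing, over phylotypes, the product of phylotype abundance and KO copy number. The sketch below illustrates this under stated assumptions: the dictionary layout, the function name, and the final renormalization to a total of 1 are illustrative choices, and the KO labels in the usage example are placeholders.

```python
from typing import Dict


def infer_ko_abundance(
    phylotype_abundance: Dict[str, float],
    phylotype_ko_counts: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Estimate relative KO abundances for one sample.

    Each phylotype contributes abundance * copy number to every KO it
    carries; contributions are summed across phylotypes and rescaled so
    that the KO abundances sum to 1 (the rescaling is an assumption).
    """
    totals: Dict[str, float] = {}
    for phylotype, abundance in phylotype_abundance.items():
        # Phylotypes without an inferred gene catalog contribute nothing.
        for ko, copies in phylotype_ko_counts.get(phylotype, {}).items():
            totals[ko] = totals.get(ko, 0.0) + abundance * copies
    norm = sum(totals.values())
    if norm == 0:
        return totals
    return {ko: value / norm for ko, value in totals.items()}


if __name__ == "__main__":
    sample = {"genus_A": 0.75, "genus_B": 0.25}
    catalog = {"genus_A": {"KO_a": 1, "KO_b": 2}, "genus_B": {"KO_a": 1}}
    print(infer_ko_abundance(sample, catalog))
```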

Metabolic  pathway  reconstruction 

Inferred  per-community  gene  (KO)  abundances  were 
subsequently  reconstructed  into  microbial  pathway  rela¬ 
tive  abundances  using  HUMAnN,  the  Human  Micro¬ 
biome  Project  metabolic  reconstruction  pipeline  [71]. 
KOs  were  grouped  into  pathways  represented  as  gene 
sets  using  HUMAnN,  which  chooses  pathways  by  maxi¬ 
mum  parsimony  using  MinPath  [93]  and  computes  each 
pathway’s  relative  abundance  as  a  smoothed  average 
over  all  genes  within  it,  taking  into  account  outliers  and 
gap  filling.  We  ran  HUMAnN  three  times  to  reconstruct 
three  complementary  types  of  pathways  from  these 
genes:  small  metabolic  modules  (using  KEGG’s  conjunc¬ 
tive  normal  form  logic),  large  metabolic  pathways,  and 
Gene  Ontology  terms  (using  annotation-to-KO  map¬ 
pings  from  nine  well-characterized  KEGG  microbes: 
ban, cje, cpe, eco, nse, pae, see, son, and vch). For each
of  these  three  types  of  pathway,  HUMAnN  input  the 
inferred  relative  abundances  of  all  genes  in  each  sample, 
and  output  the  relative  abundances  of  pathways  within 
the  sample.  Subsequent  analysis  handled  these  sample¬ 
by-pathway  relative  abundances  in  the  same  manner  as 
sample-by-clade  microbial  abundances. 

Significant  associations  of  microbial  clades  and  pathways 
with  sample  metadata 

Inverse Simpson diversity, Chao1 richness (using the R
fossil package), and Pielou evenness were calculated for
clade abundance, KEGG pathway and module abundance,
and Gene Ontology term abundance [94-97].
Next, data were pre-processed for quality control before
modeling. Clinical metadata were removed when more
than 10% of data were missing, or when they did not vary
in value over the available samples. Clades, pathways, and
features of very low abundance (< 0.001 in > 90% of samples)
and feature outliers outside of the lower or upper
outer fence (3 × interquartile range) were removed. Missing
data were imputed for significance testing with the
mean abundance of the sample; missing factor metadata
were imputed with a ‘NA’ factor level using the
na.gam.replace function from the R gam package [98]. Unless stated
otherwise, all subsequent analyses and calculations were
performed using these processed data. After processing,
228 and 231 samples passed quality control for
clade abundance and functional abundance analyses,
respectively.
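
For readers unfamiliar with the three ecological measures named above, the following Python sketch gives their textbook definitions. It is not the R fossil implementation used in the study. The function names are hypothetical, Chao1 is shown in its classic form with the bias-corrected fallback when no doubletons are present, and the inputs are assumed to be integer counts per feature.

```python
import math

def inverse_simpson(counts):
    """Inverse Simpson diversity: 1 / sum(p_i^2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou evenness: Shannon entropy divided by log(observed richness)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if len(props) < 2:
        return 0.0  # evenness is undefined for a single feature; 0 by convention here
    shannon = -sum(p * math.log(p) for p in props)
    return shannon / math.log(len(props))

def chao1(counts):
    """Chao1 richness estimate from singleton and doubleton counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used when there are no doubletons.
    return observed + singletons * (singletons - 1) / 2

even_counts = [10, 10, 10, 10]   # hypothetical, perfectly even community
skewed_counts = [5, 1, 1, 2]     # hypothetical community with rare features
```

A perfectly even community of four features has an inverse Simpson diversity of 4 and an evenness of 1; rare features raise the Chao1 estimate above the observed richness.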

Finally, clades and functions were tested for statistically
significant associations with clinical metadata of
interest by using a novel multivariate algorithm. Each
clade (excluding ecological measures) was normalized
with a variance-stabilizing arcsine square-root transformation
and evaluated with a general linear model (in R
using the glm function). Model selection for sparse data
was performed per clade using boosting (gbm package
[99]). A multivariate linear model associating all available
metadata with each clade independently was
boosted, and any metadata selected in at least 1% of
these iterations was finally tested for significance in a
standard generalized linear model. This composite
model was thus of the form:

\[ \arcsin\left(\sqrt{y_i}\right) = \sum_{p} \beta_p X_{i,p} + \varepsilon_i, \quad i = 1, \ldots, n \]

where p indexes the clinical metadata selected by
boosting.
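
The final fitting step of this model can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the boosting-based selection of metadata is omitted, ordinary least squares stands in for the R glm call, an intercept column is added for convenience, and the function name and toy data are hypothetical.

```python
import numpy as np

def fit_transformed_model(y, X):
    """Fit arcsin(sqrt(y)) on the selected metadata by least squares.

    y: relative abundances of one clade, each in [0, 1], length n.
    X: (n, p) matrix holding only the metadata retained by model selection.
    Returns the coefficients (intercept first) and the residuals.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Variance-stabilizing transform for proportional data.
    z = np.arcsin(np.sqrt(y))
    # Assumption: include an intercept column alongside the metadata.
    design = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ beta
    return beta, residuals

# Toy data constructed so that the transformed response is exactly 0.2 + 0.3 * x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(0.2 + 0.3 * x) ** 2
beta, residuals = fit_transformed_model(y, x.reshape(-1, 1))
```

Because the toy response is exactly linear on the transformed scale, the fit recovers the intercept and slope used to generate it, with residuals near zero.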

Within each metadatum/clade association independently,
multiple comparisons over factor levels were
adjusted using a Bonferroni correction; multiple hypothesis
tests over all clades and metadata were adjusted to
produce a final Benjamini and Hochberg false discovery
rate [100]. Unless otherwise indicated, significant association
was considered below a q-value threshold of 0.25;
the KEGG pathway sulfur metabolism (ko00920) had an
average q-value of 0.26 for association with Crohn’s disease.
Multiple factor analysis was performed to visualize
the relationships within heterogeneous factor data as well
as with a select group of taxa found to be significantly
associated with metadata (using the FactoMineR R package
[101]). Total abundances and significant associations
between metadata, taxa, and functions are listed in Additional
files 1 and 11.
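
The two multiple-testing adjustments named above can be written compactly. The Python sketch below uses the standard definitions of the Bonferroni and Benjamini and Hochberg procedures; the function names and the example P-values are hypothetical and are not results from this study.

```python
import numpy as np

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P-value by the number of tests."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(p_values):
    """Benjamini and Hochberg false discovery rate (q-values)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q

example_p = [0.01, 0.04, 0.03, 0.5]      # hypothetical nominal P-values
q_values = benjamini_hochberg(example_p)
significant = q_values < 0.25            # threshold used in the text
```

With these example values, the first three tests fall below the q-value threshold of 0.25 and the fourth does not.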

Sequence  alignment  for  segmented  filamentous  bacteria 

To  determine  whether  SFB  were  present  in  samples,  three 
sequences  of  SFB  (X80834,  X87244,  and  X77814)  from 
three species (chicken, rat, and mouse) were aligned by
blastn, using seed word sizes of both 20 and 15. No sequences
were  found  with  >  95%  identity  over  an  alignment  length 
of  at  least  100  nucleotides.  The  average  sequence  length 
from  the  study  was  435  nucleotides. 
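
The acceptance criterion described above (> 95% identity over at least 100 aligned nucleotides) can be applied to alignment output as in the following Python sketch. It assumes BLAST's standard 12-column tabular output, which the text does not state was used; the function name and the example hit lines are hypothetical and do not represent data from this study, in which no sequences passed the filter.

```python
def passing_hits(blast_tabular_lines, min_identity=95.0, min_length=100):
    """Keep hits with identity > min_identity over >= min_length nucleotides.

    Expects tab-separated lines whose first four columns are
    query ID, subject ID, percent identity, and alignment length.
    """
    hits = []
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity > min_identity and length >= min_length:
            hits.append((query, subject, identity, length))
    return hits

# Hypothetical hit lines, for illustration only.
example_lines = [
    "read1\tX80834\t97.50\t120\t3\t0\t1\t120\t5\t124\t1e-50\t200",
    "read2\tX87244\t93.00\t300\t21\t0\t1\t300\t1\t300\t1e-90\t350",
    "read3\tX77814\t99.00\t80\t1\t0\t1\t80\t10\t89\t1e-30\t140",
]
kept = passing_hits(example_lines)
```

Only the first hypothetical hit satisfies both conditions; the second fails on identity and the third on alignment length.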

Shotgun  metagenomic  sequencing 

To provide internal validation of inferred microbial
community gene and pathway compositions, stool DNA
from seven healthy controls and four CD patients was
subjected to metagenomic shotgun sequencing. Libraries
were constructed with the Illumina Nextera XT kit and
sequenced on an Illumina MiSeq using 2 × 150 bp
paired-end sequencing according to the manufacturer’s
instructions. This resulted in sequencing depths ranging
from 3.9 to 270 meganucleotides, average 119 meganucleotides,
from which microbial community function
was determined with HUMAnN [71] as described above.

Sequence  accession  numbers  and  availability 

Sequences  generated  in  this  study  are  publicly  available 
(NCBI  BioProject  ID  numbers  82111  and  175224). 

Additional  material 


Additional file 1: Taxa significantly associated with IBD status or
subject metadata using a boosted general linear model. A
multivariate analysis was performed to associate each microbial clade
with a sparse selection of disease status and clinical metadata (selected
through boosting; see Materials and methods). All clades and metadata
in these associations are given with nominal P-values from the
multivariate linear model and with Benjamini and Hochberg (BH)
corrected false discovery rate (q-values) up to a threshold of 0.25. In this
and all other supplemental tables, blank spaces indicate values that were
not significant but are shown for comparison with related significant
data.

Additional  file  2:  Effects  of  biogeography  on  gut  microbiome 
composition differentiates stool and biopsy communities. The
composition of phyla stratified by biopsy location or fecal sample origin
mainly  differentiates  stool  and  biopsy  communities.  Sample  count  per 
location  is  indicated  in  parentheses.  Biopsy  locations  (above)  do  not 
substantially  differ  in  composition,  while  biopsies  compared  to  stool 
(below)  differ  significantly  in  all  phyla. 

Additional  file  3:  Univariate  analysis  of  associations  between 
microbial composition and biopsy location. A univariate analysis for
associations between taxa and biopsy sites was conducted using LEfSe
[102] considering the six regions annotated for these samples: 1)
terminal ileum (TI), 2) cecum, 3) left colon, 4) transverse colon, 5) right
colon, and 6) sigmoid colon and rectum. (a) Relatively few clades were
strongly associated with biopsy locations, and these tended to mirror
expected intestinal pH and the clades described here as particularly
affected by disease-linked inflammation. (b-g) Abundant major clades,
including the Firmicutes (b), showed extremely modest variations with
intestinal region, driven by specific members depleted in low-pH regions,
including Roseburia (c) (high in the left and sigmoid colon),
Ruminococcaceae (d), and to a lesser degree Alistipes (e). Clades enriched
in low-pH regions included Fusobacterium (f) (high in TI and right colon)
and Enterobacteriales (g) (particularly in TI).

Additional file 4: Locations of patient biopsies. Distribution of biopsy
samples  available  for  this  study  as  classified  by  the  OSCCAR  and  PRISM 
cohort  collection  protocol. 

Additional  file  5:  Univariate  analysis  of  associations  between 
microbial  composition  and  gender.  A  univariate  test  for  associations  of 
subject  gender  with  microbial  clades  was  conducted  using  LEfSe  [102], 
resulting  in  few  and  weak  associations  concordant  with  previous  studies 
[5].  Here,  Clostridium  and  the  Streptococcaceae  were  weakly  associated 
with  gender  at  P  <  0.05,  but  did  not  remain  significant  at  P  <  0.1. 

Additional  file  6:  Bifidobacterium  genus  abundance  decreases 
significantly  with  age.  The  association  of  Bifidobacterium  abundance 
with  disease  status  and  clinical  metadata  (including  age)  was  determined 
to  be  significant  in  these  data  using  a  sparse  general  linear  model.  Clade 
abundances  were  transformed  with  the  arcsine  square-root 
transformation for proportional data (y-axis). Size of effect, standard
deviation, P-value (p) and Benjamini and Hochberg false discovery rate
(q)  are  shown  in  parentheses,  and  the  line  of  best  fit  in  green. 

Additional  file  7:  Escherichia/Shigella  abundance  is  significantly 
decreased in mesalamine-treated subjects. The association of these
genera  (indistinguishable  by  16S  rRNA  gene  sequencing)  with  disease 
status  and  clinical  metadata  (including  mesalamine  treatment)  was 
determined  to  be  significant  using  a  sparse  general  linear  model  (see 
Materials  and  methods).  Clade  abundances  were  transformed  with  the 
arcsine  square  root  transformation  for  proportional  data  and  are  plotted 
along  the  y-axis  as  two  notched  box  plots  (samples  without  and  with 
mesalamine use). Size of effect, standard deviation, P-value (p) and
q-value (q) are shown in parentheses.

Additional  file  8:  Stratification  of  clades  associated  with  IBD  status 
by  sample  biogeography.  Fifteen  microbial  clades  were  significantly 
associated  specifically  with  IBD  status  (q  <  0.25)  using  a  multivariate 
linear  model  incorporating  clinical  metadata  (see  Materials  and  methods). 
Although  this  model  putatively  asserts  that  this  association  holds 
regardless of sample origin (biopsy or stool), we verified this by
stratifying each clade's abundance by sample type, stool (1) or biopsy (0).
Green coloring indicates that a clade's abundance was significantly
reduced in IBD using the full model, red increased. These trends are
uniformly  preserved  after  explicit  stratification  by  stool  versus  biopsy 
sample  origins. 

Additional  file  9:  Univariate  associations  of  microbial  composition 
with  biopsy  location.  Results  of  a  LEfSe  analysis  of  the  six  location 
categories available for biopsies in this study, excluding two
anastomosis samples.

Additional  file  10:  Covariation  of  microbial  community  function  in 
IBD with treatment, environment, biometrics, and disease subtype.
Fecal and biopsy samples from 231 IBD patients and healthy controls are
plotted as squares (iCD) or circles and colored by disease status. Axes
show the first two components of overall variation as determined by
multiple factor analysis (see Materials and methods). Clinical and
environmental  covariates  are  shown  in  bold,  while  individual  microbial 
functions  (Gene  Ontology  terms)  are  italicized.  Covariation  patterns  are 
similar  to  those  determined  using  microbial  abundance  (Figure  1). 

Additional  file  11:  KEGG  pathways  significantly  associated  with  IBD 
status or subject metadata using a boosted general linear model. A
multivariate analysis was performed to associate each pathway with a
sparse selection of disease status and clinical metadata (selected through
boosting; see Materials and methods). All pathways and metadata in
these associations are given with nominal P-values from the multivariate
linear model and with Benjamini and Hochberg (BH) corrected false
discovery  rate  (q-values)  up  to  a  threshold  of  0.25. 

Additional  file  12:  KEGG  metabolic  modules  significantly  associated 
with  IBD  status  or  subject  metadata  using  a  boosted  general  linear 
model.  A  multivariate  analysis  was  performed  to  associate  each 
metabolic  module  with  a  sparse  selection  of  disease  status  and  clinical 
metadata  (selected  through  boosting;  see  Materials  and  methods).  Each 
module and metadata in these associations is given with nominal
P-values from the multivariate linear model and with Benjamini and
Hochberg  (BH)  corrected  false  discovery  rate  (q-values)  up  to  a  threshold 
of  0.25. 

Additional  file  13:  Gene  Ontology  terms  significantly  associated 
with  IBD  status  or  subject  metadata  using  a  boosted  general  linear 
model.  A  multivariate  analysis  was  performed  to  associate  each  Gene 
Ontology  term  with  a  sparse  selection  of  disease  status  and  clinical 
metadata  (selected  through  boosting;  see  Materials  and  methods).  Each 
term  and  metadata  in  these  associations  is  given  with  nominal  P-values 
from  the  multivariate  linear  model  and  with  Benjamini  and  Hochberg 
(BH) corrected false discovery rate (q-values) up to a threshold of 0.25.

Additional file 14: Shotgun metagenomic sequencing validates
predicted microbial metabolic trends in a subset of healthy and CD
microbiomes. A subset of 11 stool samples for which microbial DNA
was available were subjected to shallow metagenomic sequencing using
the MiSeq platform (150-nucleotide paired-end reads) averaging 119
meganucleotides per sample. (a) Of the seven microbial metabolic
modules highlighted in Figure 5, six retained the same over- or
under-abundance trend predicted from 16S sequencing in this subset, with the
seventh (cobalamin biosynthesis) falling below the limit of detection. (b)
Six additional metabolic modules of interest with significant differences
in the full IBD dataset retained the trend expected with CD in this
subset, including depletion of glycolysis processes and enrichment for
bacterial secretion systems.

Additional  file  15:  Correlation  of  microbial  gene  families  estimated 
from  16S  gene  pyrosequencing  and  whole-genome  shotgun 
sequencing  data.  Ancestral  state  reconstruction  was  used  to  infer 
metagenomes  using  16S  gene  pyrosequencing  of  samples  from  multiple 
body  sites  from  the  Human  Microbiome  Project  (see  Materials  and 
methods). The relative abundance of KOs inferred from 16S sequencing
and measured from paired whole-community genome sequencing
samples were correlated (Spearman rank correlation) and plotted per
body site. Each box plot shows the distribution of the correlation of
relative KO abundance from 16S and whole-genome sequencing; specific
sample-pair correlations are plotted as dots. Median correlation for
Human Microbiome Project stool samples is 0.75 for an average n = 75
per body site. As each correlation is calculated over approximately 5,400
KOs, correlation values above 0.59 are significant at a
Bonferroni-corrected P < 0.05.